Deep transfer learning for COVID-19 fake news detection in Persian

Masood Ghayoomi; Maryam Mousavian

doi:10.1111/exsy.13008

Expert Syst. 2022 Apr 3:e13008. doi: 10.1111/exsy.13008. Online ahead of print.

Authors

Masood Ghayoomi¹, Maryam Mousavian²

Affiliations

¹ Faculty of Linguistics, Institute for Humanities and Cultural Studies, Tehran, Iran.
² Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran.

Abstract

The spread of fake news on social media has increased dramatically in recent years, and fake news detection systems have therefore attracted researchers' attention worldwide. The COVID-19 outbreak in 2019 and the ensuing worldwide epidemic made the importance of this issue more apparent. As a result, many researchers have begun to collect English datasets and to study COVID-19 fake news detection. However, for many low-resource languages, including Persian, accurate tools for automatic COVID-19 fake news detection cannot be developed because annotated data for the task is lacking. In this article, we aim to develop a Persian corpus in the COVID-19 domain in which fake news is annotated, and to provide a model for detecting Persian COVID-19 fake news. Given the impressive advancement of multilingual pre-trained language models, cross-lingual transfer learning can be used to improve the generalization of models trained on low-resource language datasets. Accordingly, we use the state-of-the-art deep cross-lingual contextualized language model XLM-RoBERTa together with parallel convolutional neural networks to detect Persian COVID-19 fake news. Moreover, we transfer knowledge across domains to improve the results, using both the English COVID-19 dataset and the general-domain Persian fake news dataset. The combination of cross-lingual and cross-domain transfer learning outperformed the other models and beat the baseline by 2.39%, a significant improvement.
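
The abstract names parallel convolutional neural networks applied on top of contextualized token representations, without giving details. As a rough illustration only, the sketch below shows the general idea of parallel convolution branches with max-over-time pooling in plain Python. It is not the authors' implementation: the kernel widths (2, 3, 4), the number of filters per branch, the ReLU activation, the random weights, and the random vectors standing in for XLM-RoBERTa token embeddings are all assumptions made for the example, and the function names are hypothetical.

```python
import random


def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one filter over the token sequence and max-pool over time.

    embeddings: list of token vectors (one list of floats per token).
    kernel: list of weight rows, one per token in the window, so
            len(kernel) is the kernel width.
    Assumes the sequence is at least as long as the kernel width.
    """
    width = len(kernel)
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias
        for token_vec, kernel_row in zip(window, kernel):
            total += sum(x * w for x, w in zip(token_vec, kernel_row))
        activations.append(max(0.0, total))  # ReLU (an assumption)
    return max(activations)  # max-over-time pooling


def parallel_cnn_features(embeddings, branches):
    """Run every branch over the same input and concatenate the results.

    branches: list of branches; each branch is a list of filters that
              share one kernel width.
    """
    features = []
    for filters in branches:
        for kernel in filters:
            features.append(conv1d_max_pool(embeddings, kernel))
    return features


def make_filters(rng, width, dim, count):
    """Create `count` randomly initialised filters of the given width."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(width)]
        for _ in range(count)
    ]


rng = random.Random(0)
dim = 8  # toy embedding size; real contextual embeddings are much larger
# Stand-in for contextual token embeddings of a 12-token sentence.
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(12)]
# Three parallel branches with hypothetical kernel widths 2, 3 and 4.
branches = [make_filters(rng, w, dim, 4) for w in (2, 3, 4)]
features = parallel_cnn_features(tokens, branches)
print(len(features))
```

In a full model, the concatenated feature vector would typically be passed to a classification layer that outputs a fake or real label, and the weights would be learned rather than sampled at random; both points are general practice for this kind of architecture rather than details taken from the abstract.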

Keywords: COVID‐19; contextualized text representation; deep neural network; fake news detection; transfer learning.

Publication types

News